Abstract

A distributed discovery service is a key concept in scalable and dynamic grid environments. In this paper, we propose a new topology for the grid discovery service based on the super-peer technique. The model is designed so that each super-peer within a cluster holds routing indices (RIs) based on cobweb and uses the hop-count routing index (HRI) to select the best neighbor. In addition, each super-peer includes a cache table that stores queries and their results. We compare the new model with an existing method in terms of the response time and the number of submitted messages. An illustrative simulation is also presented to demonstrate the efficiency and validity of the new technique.

1. Introduction

Grid systems generally solve science and engineering problems in large-scale environments and integrate large volumes of geographically distributed computing and storage resources, data, services, and applications [1]. The discovery service in a grid is directly affected by the diversity, dynamism, heterogeneity, and distribution of resources, and an effective discovery service can notably increase grid efficiency [2]. The classical approaches to grid discovery are based on centralized [3–7] or hierarchical [8, 9] architectures. These methods collect the grid resource information and use a central index server to maintain it. Such architectures suffer from two main drawbacks: bottlenecks and a single point of failure [10–12]. To avoid these problems, the P2P model has been adopted [3, 13–15]. P2P topologies in grid environments comprise structured and unstructured models [16–23]. The super-peer mechanism is a category of P2P technology that was proposed in [24]; for more details, see [25–28].

The modified breadth-first-search (BFS) and intelligent search (IS) methods were introduced in [29]. The breadth-first-search technique [30] is an extension of the Gnutella protocol that is implemented locally, is based on keyword searching, and selects the neighbor peers randomly. In the IS method, a profile is first built for each peer, and this profile is then used to decide where to send the query. Ghorbani et al. [31] presented a self-adaptive resource discovery method for unstructured P2P environments based on the feedback of the network nodes. They also used the learning automata (LA) algorithm [32] to train peers to find the best neighbor peer during the discovery process. Löser et al. [33], using the semantic clustering method [34, 35] in a super-peer network, presented semantic overlay clustering (SOC) to link each information provider peer semantically to a cluster super-peer.

Some approaches [2, 36–39] also index and summarize the resource features to reduce the amount of information that is kept and transferred in the discovery service.

Routing indices (RIs) are a P2P technique that organizes the indexed and summarized resource information of nodes [40]. The benefit of this approach is that queries are disseminated and forwarded only to the parts of the network where resources exist, which avoids flooding query requests to nodes that are not useful. The main drawback of this indexing technique arises from the presence of cycles in the network graph. As an extension of RIs, the hop-count routing index (HRI) keeps the resource information in a table structure organized by hop count [41–43].

Marzolla et al. [44] proposed a discovery technique in which the resource information of each domain and the summarized information of the neighbor domains are maintained by the brokers. This method uses the k-bit vector to represent the indices and summarize the information of resources. Puppin et al. [41] implemented a grid information service (GIS) that utilized the HRI. This method has two significant entities: (1) the agent that is in control of a super-peer builds the node’s resource information and (2) the aggregator that receives the resource information from the mentioned agent and indexes it.

Caminero et al. [42] proposed the use of HRI as a means to route jobs within a P2P system. The main drawback of this model is that it only considers numeric parameters (such as effective bandwidth, the total number of machines in the domain, and the domain workload) to perform the resource discovery. Caminero et al. [43] presented a model that uses the HRI to construct the summary information based on a cobweb tree [37]. This approach calculates the goodness function [40] to route and forward the query to the neighbor peers that are most likely to satisfy it. Furthermore, to adapt the summaries to RIs, they proposed a technique called n-level summaries, which filters out attributes that have the same values and whose probability is less than a threshold.

In this paper, we present a new discovery method based on super-peer network in which each super-peer within the cluster has the routing indices (RIs) based on cobweb and uses the hop-count routing index (HRI) to select the best neighbor. Furthermore, each super-peer utilizes a cache table to store the query and the query results.

The remainder of this paper is structured as follows. Section 2 describes the configuration of the grid based on the super-peer network, and Section 3 introduces the proposed resource discovery method in detail. Section 4 evaluates the simulation results obtained with GridSim [45], which show the effective performance of our approach. The final section concludes the paper and outlines directions for future work.

2. Configuration of Grid

Grid systems can use the super-peer topology to implement their infrastructure. As shown in Figure 1, the super-peer (SP) in each virtual organization (VO) or cluster operates as a server. Super-peers receive queries from the client peers (CPs) or from neighbor super-peers and answer them. The SPs communicate with each other and form an overlay network [46]. Generally, the SP is a node with high capability compared with the CPs in its VO/cluster, and it processes the queries. When a CP connects to a VO/cluster, the local SP indexes the information of its resources. When the CP leaves, the indexed information related to it is removed from the VO/cluster. In addition, if the resource information of a CP changes, the updated information is sent to its SP. When a CP needs resources, it creates a query and submits it to the SP. The SP searches the indexed information to find appropriate resources. If the requested resources are found, the IP address of the CP that owns the result is sent to the requesting peer; otherwise, the SP creates a copy of the request and forwards it to the overlay network. The SP of each cluster that receives the request then searches its own domain for the requested resources.
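The join, leave, update, and local lookup behavior of an SP described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `SuperPeer`, its methods, the attribute strings, and the IP addresses are all hypothetical, and resources are assumed to be simple sets of attribute strings.

```python
class SuperPeer:
    """Hypothetical super-peer that indexes the resources of its client peers."""

    def __init__(self):
        # Client peer IP address -> set of resource attributes (assumed format).
        self.index = {}

    def join(self, address, resources):
        # A CP connects: index its resource information.
        self.index[address] = set(resources)

    def update(self, address, resources):
        # A CP reports changed resources: replace its indexed information.
        self.index[address] = set(resources)

    def leave(self, address):
        # A CP leaves: remove its indexed information.
        self.index.pop(address, None)

    def lookup(self, requirements):
        # Return the addresses of CPs whose resources cover every requirement.
        wanted = set(requirements)
        return sorted(a for a, res in self.index.items() if wanted <= res)


sp = SuperPeer()
sp.join("10.0.0.2", {"OS=Linux", "CPU=Intel"})
sp.join("10.0.0.3", {"OS=Linux"})
local_hits = sp.lookup({"OS=Linux", "CPU=Intel"})
sp.leave("10.0.0.2")
after_leave = sp.lookup({"OS=Linux", "CPU=Intel"})
```

An empty lookup result corresponds to the case in which the SP forwards a copy of the request to the overlay network.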

3. New Discovery Method

We present a new discovery technique based on the super-peer network. First, we create clusters and organize the grid nodes into them. In each cluster, a super-peer maintains a summary of the resource information owned by the client peers. We use cobweb [37–39] to cluster and summarize the resource information. Each super-peer uses RIs [40] to structure and keep the summary information of the peers within its cluster. Table 1 is a sample of RIs based on Figure 2 that shows the attributes and their probability values provided by cobweb in each cluster. When a peer joins or leaves the cluster, the corresponding RI entry is created or deleted. Each cluster super-peer sends the maximum probability of each attribute to its neighbor super-peers. When the resource information of a peer changes, that peer sends the updates to its super-peer. The super-peer then rebuilds the summary and updates the RI probabilities that are higher than the threshold. Next, the maximum probability of each attribute is resubmitted to the neighbor super-peers. The super-peers use HRI [41, 43] to organize the summary information of neighbor super-peers in the grid environment. Table 2 shows the aggregated probabilities of the neighbor super-peers at different hops. It is noteworthy that the neighbor super-peers are selected based on the HRI information and not by random or flooding methods. This selection is based on the goodness function [40] at the predefined hop count. Consequently, queries are forwarded to super-peers that are likely to match the query requirements and that have nodes in their cluster owning the requested resources.
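The step in which a super-peer derives the maximum probability of each attribute from its RIs can be sketched as below. The function name `summarize_for_neighbors`, the dictionary layout, the probability values, and the way the threshold is applied (entries below it are ignored) are assumptions made for illustration; they are not taken from Table 1.

```python
def summarize_for_neighbors(routing_index, threshold=0.0):
    """Return the maximum probability per attribute over all peers in a cluster.

    routing_index maps a peer name to {attribute: probability}. Entries whose
    probability is below `threshold` are ignored (an assumed interpretation).
    """
    summary = {}
    for peer_attributes in routing_index.values():
        for attribute, probability in peer_attributes.items():
            if probability >= threshold:
                summary[attribute] = max(summary.get(attribute, 0.0), probability)
    return summary


# Illustrative RIs of one cluster (values are made up).
ri = {
    "P1": {"OS=Linux": 0.9, "CPU=Intel": 0.4},
    "P2": {"OS=Linux": 0.6, "CPU=Intel": 0.8},
    "P3": {"OS=Windows": 0.1},
}
summary = summarize_for_neighbors(ri, threshold=0.2)

# When a peer leaves, its RI entry is deleted and the summary is rebuilt.
del ri["P2"]
summary_after_leave = summarize_for_neighbors(ri, threshold=0.2)
```

Sending only one value per attribute keeps the update messages between super-peers small, at the cost of hiding which peer inside the cluster holds the resource.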

The goodness of a neighbor super-peer ($SP_i$) with respect to the query requirement is calculated as follows:

$$\mathrm{goodness}(SP_i, q) = \sum_{j=1}^{H} P(\mathrm{HRI}[i][j], q),$$

where $SP_i$ is the $i$-th neighbor super-peer; $H$ is the horizon of HRI (number of hops); $\mathrm{HRI}[i][j]$ refers to the HRI entry for $SP_i$ at the $j$-th hop; $q$ is the requested requirement; and $P(\mathrm{HRI}[i][j], q)$ points to the probability of a neighbor super-peer in the position $[i][j]$ of the HRI table with respect to $q$. The cluster super-peer selects the best neighbor super-peer based on the maximum result of the goodness function and forwards the query to it.
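A small sketch of neighbor selection by goodness is given below. It assumes that the goodness of a neighbor is the sum, over the hops up to the horizon, of the probability that the HRI entry matches the query, and that the probability of a multi-attribute query is the product of the per-attribute probabilities. Both assumptions, the function names, and the numeric values are illustrative only.

```python
import math


def entry_probability(entry, query):
    # Assumption: attributes are independent, so their probabilities multiply.
    # An attribute missing from the HRI entry contributes probability 0.
    return math.prod(entry.get(attribute, 0.0) for attribute in query)


def goodness(hri_row, query):
    # hri_row holds one entry per hop (index 0 is hop 1) up to the horizon.
    return sum(entry_probability(entry, query) for entry in hri_row)


def best_neighbor(hri, query):
    # Select the neighbor super-peer with the maximum goodness, if any.
    if not hri:
        return None
    return max(hri, key=lambda neighbor: goodness(hri[neighbor], query))


# Illustrative HRI with a horizon of 2 hops (values are made up).
hri = {
    "SP2": [{"OS=Linux": 0.9, "CPU=Intel": 0.5}, {"OS=Linux": 0.4, "CPU=Intel": 0.5}],
    "SP3": [{"OS=Linux": 0.3, "CPU=Intel": 0.5}, {"OS=Linux": 0.8, "CPU=Intel": 0.5}],
}
query = ("OS=Linux", "CPU=Intel")
g2 = goodness(hri["SP2"], query)   # 0.45 + 0.20
g3 = goodness(hri["SP3"], query)   # 0.15 + 0.40
chosen = best_neighbor(hri, query)
```

With these values SP2 is chosen, because its stronger match lies at the first hop and its total is higher.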

Moreover, in our approach, each super-peer caches the submitted queries and their results in a cache table. The cache table can improve grid performance by decreasing the delay, bandwidth usage, traffic, and number of messages sent in the discovery process. Table 3 is the cache table of a super-peer; it contains the requirements of three queries in the form of attributes and the addresses of the nodes that own the required resources (result nodes). Each entry in this table is either created in the local domain or sent by neighbor super-peers in other clusters.
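A cache table of this kind can be sketched as a mapping from a set of required attributes to the addresses of the result nodes. The class `CacheTable`, its method names, and the sample attributes and addresses are hypothetical; the layout of Table 3 may differ.

```python
class CacheTable:
    """Hypothetical cache of query requirements and their result nodes."""

    def __init__(self):
        # frozenset of required attributes -> list of result node addresses
        self.entries = {}

    def store(self, requirements, result_nodes):
        # Insert a new entry or overwrite the existing one for this query.
        self.entries[frozenset(requirements)] = list(result_nodes)

    def match(self, requirements):
        # Return the cached result nodes, or None when there is no entry.
        return self.entries.get(frozenset(requirements))

    def merge(self, update):
        # Apply entries received from a neighbor super-peer.
        for requirements, result_nodes in update.items():
            self.store(requirements, result_nodes)


cache = CacheTable()
cache.store({"OS=Linux", "CPU=Intel"}, ["10.0.0.2"])          # local result
cache.merge({frozenset({"RAM=8GB"}): ["10.0.1.7"]})           # neighbor update
local_entry = cache.match({"CPU=Intel", "OS=Linux"})
remote_entry = cache.match({"RAM=8GB"})
missing_entry = cache.match({"OS=Windows"})
```

Using a `frozenset` as the key makes the lookup independent of the order in which the attributes appear in the query.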

3.1. Discovery Algorithm

In Figure 2, consider cluster 1, and let P1 need some computing resources to execute its project. The cluster super-peer receives the query and searches the local domain. If a responding peer is found, the super-peer sends the address of the result node back to the requesting node and inserts an entry for that query into its cache table or updates the existing entry. Furthermore, it sends the update to the neighbor super-peers. If no results are found, the super-peer looks up the cache table before forwarding the query. If there is a related entry for that request, the super-peer answers the query locally and returns the query result to the requesting node. If not, the super-peer of cluster 1 calculates the goodness function of its neighbor super-peers (cluster 2 and cluster 3) for that query. We assume that the goodness of cluster 2 is higher than that of cluster 3. Therefore, the query is forwarded to cluster 2, whose super-peer first searches locally for the result and then looks up its cache table. If the query result is not in the cache table of cluster 2, the first best neighbor (say cluster 4) is selected and the query is forwarded to it. If the search there is not successful, the query is bounced back to the parent super-peer (cluster 2). Then, cluster 2 sends the query to its second best neighbor (cluster 5). If this search is also unsuccessful, cluster 2 has no other neighbor domain except cluster 1, so the query is bounced back and cluster 1 forwards the query to cluster 3 (its second best neighbor). When the algorithm reaches a response, the cache table is updated and the update is forwarded to the neighbor clusters. Algorithm 1 shows our discovery approach.

q: new incoming query
ClusterResource: a resource in cluster
Cache: a query result in cache table
BestNeighbor: a neighbor super-peer selected by goodness function
Neighbor: next neighbor super-peer
for incoming q do
    ClusterResource = MatchQueryClusterResource(q)
    if (ClusterResource == null) then
        Cache = MatchQueryCache(q)
        if (Cache == null) then
            BestNeighbor = HRI(q, Neighbor)
            if (BestNeighbor == null) then
                Receiver = Sender(q)
            else
                Receiver = BestNeighbor
            end if
            ForwardQueryToReceiver(q, Receiver)
        else
            SendResponseToRequester(q)
        end if
    else
        SendResponseToRequester(q)
        Store/UpdateResultToCache(q)
        Send update to Neighbor
    end if
end for

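One step of Algorithm 1 at a single super-peer can be sketched in executable form as follows. This is an illustrative reading of the pseudocode rather than the authors' code: the `SuperPeer` data class, the `handle_query` function, the `tried` set that records neighbors already contacted for a query, and the best-first neighbor list that stands in for the HRI ranking are all assumptions, and the propagation of cache updates to neighbors is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class SuperPeer:
    resources: dict = field(default_factory=dict)        # node address -> set of attributes
    cache: dict = field(default_factory=dict)            # frozenset(query) -> node address
    ranked_neighbors: list = field(default_factory=list)  # best first (stands in for HRI)


def handle_query(sp, query, sender, tried):
    """Return (action, target) for one incoming query at super-peer `sp`."""
    q = frozenset(query)
    # 1. Search the local cluster.
    for address, attributes in sp.resources.items():
        if q <= attributes:
            sp.cache[q] = address          # store or update the cache entry
            return ("respond", address)    # neighbor update is omitted here
    # 2. Look up the cache table before forwarding.
    if q in sp.cache:
        return ("respond", sp.cache[q])
    # 3. Forward to the best neighbor that has not been tried yet.
    for neighbor in sp.ranked_neighbors:
        if neighbor != sender and neighbor not in tried:
            tried.add(neighbor)
            return ("forward", neighbor)
    # 4. No neighbor left: bounce the query back to its sender.
    return ("bounce", sender)


sp1 = SuperPeer(ranked_neighbors=["SP2", "SP3"])
tried = set()
first = handle_query(sp1, {"OS=Linux"}, "P1", tried)
second = handle_query(sp1, {"OS=Linux"}, "P1", tried)
third = handle_query(sp1, {"OS=Linux"}, "P1", tried)

sp2 = SuperPeer(resources={"10.0.0.9": {"OS=Linux", "CPU=Intel"}})
hit = handle_query(sp2, {"OS=Linux"}, "SP1", set())
```

In this sketch the three calls on `sp1` forward the query to SP2, then to SP3, and finally bounce it back to the sender, which mirrors the walk through clusters described in the text.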
3.2. Stability of New Approach

In this section, we compare our model with the existing model in [43]. To this end, consider Figure 3, which shows a sample P2P topology based on the model in [43], in which each peer is responsible for the discovery process. In this model, the peers form a tree structure with a depth of 3 hops and 15 connected peers. To apply our model, we first eliminate all connections and reconfigure this topology based on super-peers, as shown in Figure 4. This figure shows that the topology now contains three clusters with a depth of 1 hop. The new configuration increases the scalability of discovery compared with the topology shown in Figure 3.

Consider Figure 3, and let a user in the domain of P0 send a query; node P0 searches locally to respond to it. If the required resource is not available, P0 must send the query to its best neighbor peer. We consider two scenarios (the optimistic and pessimistic statuses).

In the optimistic status, the response node is at the 1-hop-count level (P1 or P2). We assume that P1 is the best neighbor of P0; P0 then sends the query to P1, and the discovery process finishes successfully. In the pessimistic status, P1 receives the query from P0 and searches locally. If P1 cannot find the requested resources, the query is sent to its best neighbor (say P3). We assume that node P3 is not the owner of the needed resources, so it sends the query to its best neighbor peer (say P7). If P7 cannot respond to the request, the query is bounced back. Then P3 sends the query to its second best neighbor (P8). We assume that P8 is not the owner of the needed resource either, so the query is bounced back to P1. P1 sends the query to its second best neighbor (P4), and P4 sends it first to P9 and, if there is no response, then to P10. Suppose that the requested resources are not found at P10, so the query is bounced back to P0. Similarly, on the other side of P0, in the pessimistic status, the query goes through the following route: P2 ⟶ P5 ⟶ P11 ⟶ P12 ⟶ P6 ⟶ P13 ⟶ P14. Assuming that P14 has the requested resources, the discovery process of the method in [43] finishes there.

Next, consider Figure 4, and assume that a user in cluster 1 sends a query to the cluster super-peer (SP1). The query process begins, and SP1 searches its domain locally. If it finds the requested resource, the process is finished. Otherwise, we again consider two scenarios (the optimistic and pessimistic statuses). Let SP1 forward the query to its best neighbor (say cluster 2). In the optimistic status, the discovery process in cluster 2 is successful and the response node is found. In the pessimistic status, the search in cluster 2 is not successful, so the query is bounced back and forwarded to cluster 3. If this cluster finds the requested resource, the discovery process finishes successfully. It is well known that the scalability of resource discovery is related to the cost of the discovery process in terms of the time and the number of query messages needed to reach the response. Therefore, based on the above walk-through, the new approach requires fewer forwarding steps than the existing model in [43] and is superior from the scalability point of view. In the next section, we simulate and compare the two topologies (Figures 3 and 4) with respect to scalability.
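The two walk-throughs above can be checked with a small message count. The sketch below assumes that every forward and every bounce-back counts as one message, that response messages and the initial client-to-peer submission are not counted, and that each peer lists its neighbors best first. The function name and the dictionary encoding of Figures 3 and 4 are illustrative.

```python
def count_messages(children, root, target):
    """Count forward and bounce-back messages of a best-first search with backtracking."""
    messages = 0

    def visit(node):
        nonlocal messages
        if node == target:
            return True
        for child in children.get(node, []):
            messages += 1              # forward the query to the child
            if visit(child):
                return True
            messages += 1              # the query is bounced back
        return False

    return messages if visit(root) else None


# Tree of 15 peers with a depth of 3 hops (assumed encoding of Figure 3).
tree = {
    "P0": ["P1", "P2"],
    "P1": ["P3", "P4"], "P2": ["P5", "P6"],
    "P3": ["P7", "P8"], "P4": ["P9", "P10"],
    "P5": ["P11", "P12"], "P6": ["P13", "P14"],
}
# Three clusters with a depth of 1 hop (assumed encoding of Figure 4).
clusters = {"SP1": ["SP2", "SP3"]}

tree_optimistic = count_messages(tree, "P0", "P1")            # 1
tree_pessimistic = count_messages(tree, "P0", "P14")          # 25
cluster_optimistic = count_messages(clusters, "SP1", "SP2")   # 1
cluster_pessimistic = count_messages(clusters, "SP1", "SP3")  # 3
```

Under these counting assumptions, the pessimistic case needs 25 messages in the tree topology and 3 in the super-peer topology, while the optimistic case needs one message in both.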

4. Simulation Results

In this section, we simulate the discovery process in the grid system. First, we simulate the discovery process based on the new method (Algorithm 1) and evaluate the query response time and the submitted messages in comparison with the discovery approach presented by Caminero et al. [43]. Second, we compare the scalability of discovery related to the two topologies presented in Section 3.2. Furthermore, the grid environment has been simulated by using the GridSim toolkit [45].

Suppose that 100 users participate in our simulation environment, the connection bandwidth is 100 Mbps, the propagation delay is 10 seconds, and packets are transmitted at a maximum rate of 1500 packets/s. Each resource contains one machine with four processing elements and the “Intel” architecture, and its operating system is “Linux.” The network routers use the RIP protocol, and their scheduling method is FIFO. The first part of our simulation consists of three scenarios in which the responding cluster is selected randomly: (1) the result node is in a neighbor cluster at a 1-hop count, (2) the result node is in a neighbor cluster at a 2-hop count, and (3) the result node is in a neighbor cluster at a 3-hop count. The second part of the simulation consists of two scenarios: (1) the result node is found in the optimistic status and (2) the result node is found in the pessimistic status.

Figures 5 and 6 compare the query response time and the number of forwarded discovery messages in two grid environments: (1) the grid environment simulated with the new model, in which all super-peers use the cache table to respond to queries, and (2) the grid environment simulated with the model in [43].

The simulation results show that using the cache table improves the query response time and decreases the discovery delay. Furthermore, the results illustrate that the number of submitted query messages also decreases. As the hop count increases, the new model performs much better than the existing model in [43].

In the second part, the discovery processes are simulated in the optimistic and pessimistic statuses with the two topologies described above. The simulation results show that, compared with the existing topology, the new topology decreases the discovery time and improves the scalability of grid discovery in both statuses (Figure 7).

5. Conclusions and Future Works

This paper introduced a new model of grid discovery using P2P technology. To improve the scalability of the discovery process, the new model was configured based on the super-peer approach. Furthermore, to reduce the discovery delay, a cache table was used in the model. The simulation results show that, compared with the existing method, the new model improves the query response time and decreases the number of submitted query messages. In future work, we intend to use intelligent clustering approaches in this topology and to choose the super-peers based on network attributes such as traffic and bandwidth.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.